-
Notifications
You must be signed in to change notification settings - Fork 14.1k
perf: remove loop from str::floor_char_boundary
#149466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Rather than manually unrolling the code here, it might be worth considering if the original implementation can be made const instead: rust/library/core/src/str/mod.rs Lines 244 to 247 in 03b17b1
I had this exact thought in the original implementation, which is why I explicitly only checked the four following bytes for a character boundary. Note that I haven't fully scrutinised the code and yours may be better regardless, but I figured I'd mention this for some extra context. I would expect LLVM to automatically unroll a 4-max-iteration loop in most contexts. |
| let bytes = self.as_bytes(); | ||
| let boundary_index = if bytes[index].is_utf8_char_boundary() { | ||
| index | ||
| } else { | ||
| let mut i = index; | ||
| while i > 0 { | ||
| if self.as_bytes()[i].is_utf8_char_boundary() { | ||
| break; | ||
| // SAFETY: `bytes[index]` is a UTF-8 continuation byte, therefore there must be a byte before it. | ||
| // Note: `index.unchecked_sub(1)` would be preferable, but it doesn't the remove bounds check | ||
| // from `bytes[previous_index]`. | ||
| let previous_index = unsafe { index.checked_sub(1).unwrap_unchecked() }; | ||
| if bytes[previous_index].is_utf8_char_boundary() { | ||
| previous_index | ||
| } else { | ||
| // SAFETY: `bytes[index - 1]` is a UTF-8 continuation byte, therefore there must be a byte before it | ||
| let previous_previous_index = unsafe { index.checked_sub(2).unwrap_unchecked() }; | ||
| if bytes[previous_previous_index].is_utf8_char_boundary() { | ||
| previous_previous_index | ||
| } else { | ||
| // UTF-8 character sequences are at most 4 bytes long. | ||
| // `bytes[index - 2]`, `bytes[index - 1]`, and `bytes[index]` are all continuation bytes, | ||
| // therefore `bytes[index - 3]` must be the 1st byte in a 4-byte UTF-8 character sequence. | ||
| debug_assert!(bytes[index - 3].is_utf8_char_boundary()); | ||
| index - 3 | ||
| } | ||
| i -= 1; | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deep nesting isn't ideal. You could try matching slice patterns:
let bytes = &self.as_bytes()[..=index];
let boundary_index = match bytes {
[.., b] if b.is_utf8_char_boundary() => index,
[.., b, _] if b.is_utf8_char_boundary() => index - 1,
[.., b, _, _] if b.is_utf8_char_boundary() => index - 2,
[.., b, _, _, _] if b.is_utf8_char_boundary() => index - 3,
// SAFETY: TODO
_ => unsafe {
debug_assert!(bytes[index.saturating_sub(3)].is_utf8_char_boundary());
unreachable_unchecked()
}
};There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much nicer. I'll check if it optimizes equally as well.
|
You might want to add a codegen test under At the time, I tried to add a counter variable to limit the number of iterations but I wasn't able to remove the panics like the current implementation does.
I wrote an example codegen test to help with testing this out (others may be able to improve it further). Example codegen test file
// Test that panics and slice errors are elided from `floor_char_boundary` implementation.
//@ compile-flags: -Copt-level=3
#![crate_type = "lib"]
// CHECK-LABEL: @floor_fixed_index
#[no_mangle]
pub fn floor_fixed_index(s: &str) -> usize {
// CHECK-NOT: panic
s.floor_char_boundary(20)
}
// CHECK-LABEL: @floor_variable_index
#[no_mangle]
pub fn floor_variable_index(s: &str, i: usize) -> usize {
// CHECK-NOT: panic
s.floor_char_boundary(i)
}
// CHECK-LABEL: @floor_slice_fixed_index
#[no_mangle]
pub fn floor_slice_fixed_index(s: &str) -> &str {
// CHECK-NOT: panic
// CHECK-NOT: slice_error_fail
let index = s.floor_char_boundary(20);
&s[..index]
}
// CHECK-LABEL: @floor_slice_variable_index
#[no_mangle]
pub fn floor_slice_variable_index(s: &str, i: usize) -> &str {
// CHECK-NOT: panic
// CHECK-NOT: slice_error_fail
let index = s.floor_char_boundary(i);
&s[..index]
}
// CHECK-LABEL: @truncate_from_floor_fixed_index
#[no_mangle]
pub fn truncate_from_floor_fixed_index(s: &mut String) {
// CHECK-NOT: panic
let index = s.floor_char_boundary_new(20);
s.truncate(index)
}
// CHECK-LABEL: @truncate_from_floor_variable_index
#[no_mangle]
pub fn truncate_from_floor_variable_index(s: &mut String, i: usize) {
// CHECK-NOT: panic
let index = s.floor_char_boundary_new(i);
s.truncate(index)
} |
Yes, I noticed this today. Am hoping to eliminate all these panics. I feel like it should be possible, but... mysteries of the compiler... |
#144472 made
str::floor_char_boundarya const function, but in doing so introduced a loop. This is unnecessary because the next UTF-8 character boundary can only be within a 4-byte range.This produces excessive code for e.g.
str.floor_char_boundary(20), because the loop is unrolled.https://godbolt.org/z/5f3YsM6oK
This PR replaces the loop with 3 checks in series.
In addition to reducing code size in some cases, it also removes bounds checks from calling code when following
floor_char_boundarywith a call toString::truncate(which I assume might be a common pattern).Notes
I tried using
index.unchecked_sub(1), but found it doesn't remove bounds checks, whereasindex.checked_sub(1).unwrap_unchecked()does. Surprising!The
assert_uncheckeds are required to elide checks from code following thefloor_char_boundarycall e.g.:If this PR is accepted, I'll follow up with a similar PR for
ceil_char_boundary.Very happy to receive feedback and amend.